ABSTRACT
This paper gives general conditions for convergence rates and asymptotic normality of series estimators of conditional expectations, and specializes these conditions to polynomial regression and regression splines. Both mean-square and uniform convergence rates are derived. Asymptotic normality is shown for nonlinear functionals of series estimators, covering many cases not previously treated. Also, a simple condition for $\sqrt{n}$-consistency of a functional of a series estimator is given. The regularity conditions are straightforward to understand, and several examples are given to illustrate their application.
1. Introduction

Nonparametric estimation is useful in many econometric applications. Models often depend on a conditional expectation $E[y|x]$ that is unknown. For example, in demand analysis, one could model the demand function as a conditional expectation with unknown functional form (e.g. see Deaton, 1988, or Hausman and Newey, 1995). One way to estimate $E[y|x]$ is by least squares regression on approximating functions, referred to here as series estimation. Series estimation is convenient for imposing certain restrictions on $E[y|x]$, such as additive separability (e.g. see Stone, 1985, or Andrews and Whang, 1990). It is also computationally convenient, because the data are summarized by relatively few estimated coefficients.

Large sample properties of series estimators have been derived by Stone (1985), Cox (1988), Andrews and Whang (1990), Eastwood and Gallant (1991), Gallant and Souza (1991), and Newey (1988, 1994a, b, c). This paper extends previous results on convergence rates and asymptotic normality in several ways. These extensions include convergence rates that are faster than some published ones (e.g. Cox, 1988) or that have weaker side conditions than current results (e.g. Newey, 1994a). Also, asymptotic normality is shown for regression splines and/or nonlinear functionals that are not covered by the results of Andrews (1991), and a simple primitive condition for $\sqrt{n}$-consistency is given. In addition, the results are derived under an upper bound on the growth of the number of terms $K$ that is less restrictive than previous results (e.g. those of Andrews, 1991, or Newey, 1994), requiring only $K^2/n \to 0$ for regression splines and $K^3/n \to 0$ for power series estimators of linear functionals.

To describe a series estimator, let $g_0(x) = E[y|x]$ denote the true conditional expectation and $g$ denote some function of $x$. Also, consider a vector of approximating functions

$$p^K(x) = (p_{1K}(x), \ldots, p_{KK}(x))' \tag{1}$$

having the property that a linear combination can approximate $g_0(x)$. Let $(y_i, x_i)$, $(i = 1, \ldots, n)$, denote the data. A series estimator of $g_0(x)$ is

$$\hat g(x) = p^K(x)'\hat\beta, \qquad \hat\beta = (P'P)^- P'Y, \qquad P = [p^K(x_1), \ldots, p^K(x_n)]', \qquad Y = (y_1, \ldots, y_n)', \tag{2}$$

where $B^-$ denotes any symmetric generalized inverse. Under conditions given below, $P'P$ will be nonsingular with probability approaching one, and hence $(P'P)^-$ will be the standard inverse.
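As a minimal sketch of equation (2), the following pure-Python code regresses $y$ on a power-series basis for a scalar $x$. The basis $p^K(x) = (1, x, \ldots, x^{K-1})'$ and the helper names (`power_basis`, `solve`, `series_fit`, `series_predict`) are illustrative assumptions, not part of the paper; the code uses the ordinary inverse, which coincides with the generalized inverse when $P'P$ is nonsingular.

```python
# Minimal sketch of the series estimator in equation (2), assuming a scalar x
# and the (hypothetical) basis p^K(x) = (1, x, ..., x^{K-1})'.


def power_basis(x, K):
    """Return p^K(x) = [1, x, ..., x^(K-1)]."""
    return [x ** k for k in range(K)]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Assumes A is nonsingular (true for P'P with probability approaching one
    under the conditions in the text); otherwise a generalized inverse is needed.
    """
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def series_fit(xs, ys, K):
    """beta_hat = (P'P)^{-1} P'Y with P = [p^K(x_1), ..., p^K(x_n)]'."""
    P = [power_basis(x, K) for x in xs]
    PtP = [[sum(row[j] * row[k] for row in P) for k in range(K)] for j in range(K)]
    PtY = [sum(row[j] * y for row, y in zip(P, ys)) for j in range(K)]
    return solve(PtP, PtY)


def series_predict(beta, x):
    """g_hat(x) = p^K(x)' beta_hat."""
    return sum(b * p for b, p in zip(beta, power_basis(x, len(beta))))


# Usage: noiseless cubic data on [-1, 1] are reproduced exactly once K >= 4.
xs = [i / 10 - 1 for i in range(21)]
ys = [1 + 2 * x - x ** 3 for x in xs]
beta_hat = series_fit(xs, ys, 4)
```

With noisy data the same code gives the least squares series fit; increasing `K` trades lower approximation bias for higher variance, which is the tradeoff quantified in Section 2.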

Series estimators are convenient for imposing certain types of restrictions. For example, suppose that $E[y|x]$ is additively separable in two subvectors $x_a$ and $x_b$ of $x$, so that

$$E[y|x] = g_a(x_a) + g_b(x_b). \tag{3}$$

This restriction can be imposed by including in $p^K(x)$ functions that depend either on $x_a$ or on $x_b$, but not on both. Additivity of $\hat g(x)$ then results from it being a linear combination of these functions of the separate arguments. Another example is a partially linear model, where one of the components is a linear function, as in

$$E[y|x] = x_a'\gamma_0 + g_b(x_b). \tag{4}$$

This restriction can be imposed by including only $x_a$ and functions of $x_b$ in $p^K(x)$.
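To make the two restrictions concrete, here is a small sketch of how the choice of $p^K(x)$ imposes them, assuming scalar $x_a$ and $x_b$ and a hypothetical maximum power `m`. Any linear combination of the additive basis has a zero interaction contrast, and under the partially linear basis the effect of $x_a$ does not depend on $x_b$.

```python
# Sketch of imposing restrictions (3) and (4) through the choice of p^K(x).
# Assumes scalar x_a and x_b and a hypothetical maximum power m.


def full_basis(xa, xb, m):
    """Unrestricted power series: all xa^i * xb^j with i + j <= m."""
    return [xa ** i * xb ** j for i in range(m + 1) for j in range(m + 1 - i)]


def additive_basis(xa, xb, m):
    """Restriction (3): each term depends on x_a alone or on x_b alone."""
    return ([1.0]
            + [xa ** j for j in range(1, m + 1)]
            + [xb ** j for j in range(1, m + 1)])


def partially_linear_basis(xa, xb, m):
    """Restriction (4): x_a enters linearly; only x_b gets a power series."""
    return [xa] + [xb ** j for j in range(m + 1)]


def fitted(coef, basis, xa, xb, m):
    """Evaluate the linear combination p^K(x)'coef for the chosen basis."""
    return sum(c * t for c, t in zip(coef, basis(xa, xb, m)))


m = 3
print(len(full_basis(0.5, 0.5, m)),
      len(additive_basis(0.5, 0.5, m)),
      len(partially_linear_basis(0.5, 0.5, m)))  # prints: 10 7 5
```

Dropping the interaction terms shrinks the number of terms needed for a given degree, which is the source of the faster convergence rates discussed next.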

The value of imposing either of these restrictions is that efficiency improvements result, in the sense that convergence rates are faster. The partially linear model is particularly convenient when $x_a$ has a large number of elements, e.g. when $x_a$ consists of categorical variables. We explicitly consider both types of restrictions in this paper.

There are several different types of series estimators that could be used, such as Fourier series, power series, and splines. Fourier series are not considered here because, as in Andrews (1991), it is difficult to derive primitive conditions except when $g_0(x)$ is periodic, which is not relevant for most econometric applications. Primitive conditions will be given in Sections 5 and 6 for power series and regression splines.


2. Convergence Rates

The results will follow from a few primitive, easy-to-interpret conditions. The first condition is

Assumption 1: $(y_1, x_1), \ldots, (y_n, x_n)$ are i.i.d. and $\mathrm{Var}(y|x)$ is bounded.


The bounded conditional variance assumption is difficult to relax without affecting the convergence rates. For the next condition let $\|B\| = [\mathrm{trace}(B'B)]^{1/2}$ be the Euclidean norm for a matrix $B$. Also, let $\mathcal{X}$ be the support of $x_i$.

Assumption 2: For every $K$ there is a nonsingular constant matrix $B$ such that for $P^K(x) = Bp^K(x)$: i) the smallest eigenvalue of $E[P^K(x)P^K(x)']$ is bounded away from zero uniformly in $K$; and ii) there is a sequence of constants $\zeta_0(K)$ satisfying $\sup_{x \in \mathcal{X}} \|P^K(x)\| \le \zeta_0(K)$ and $K = K(n)$ such that $\zeta_0(K)^2 K/n \to 0$ as $n \to \infty$.


This condition imposes a normalization on the approximating functions, bounding the second moment matrix away from singularity, and restricting the magnitude of the series terms. This condition is useful for controlling the convergence in probability of the sample second moment matrix of the approximating functions to its expectation, in the Euclidean norm. Newey (1988) and Andrews (1991) also impose bounds on the smallest eigenvalue of the second moment matrix. Primitive conditions are given below for regression splines and power series when the density of $x$ is bounded away from zero, with $\zeta_0(K)$ equal to $C\sqrt{K}$ and $CK$ respectively, for a constant $C$, leading to the restrictions $K^2/n \to 0$ or $K^3/n \to 0$ mentioned in the introduction.

Footnote: As with the Fourier series discussed in Section 1, deriving primitive conditions is also difficult for Gallant's Fourier flexible form. The reasons for this difficulty are discussed in detail in Gallant and Souza (1991).

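For controlling the bias of the estimator it is useful to specify a rate of approximation for the series. To do so, let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote a vector of nonnegative integers having the same dimension $r$ as $x$, let $|\lambda| = \sum_{j=1}^r \lambda_j$, and let $d$ be any nonnegative integer. Define

$$|g|_d = \max_{|\lambda| \le d} \sup_{x \in \mathcal{X}} \left| \partial^{\lambda} g(x) \right|, \qquad \partial^{\lambda} g(x) = \frac{\partial^{|\lambda|} g(x)}{\partial x_1^{\lambda_1} \cdots \partial x_r^{\lambda_r}}.$$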


Assumption 3: For an integer $d \ge 0$ there are $\alpha$ and $\beta_K$ such that $|g_0 - p^{K\prime}\beta_K|_d = O(K^{-\alpha})$ as $K \to \infty$.


This condition requires that the uniform approximation error to the function and its derivatives up to order $d$ shrink at the rate $K^{-\alpha}$. The integer $d$ will be specified below as the order of the derivative for which a uniform convergence rate is derived. The constant $\alpha$ is related to the smoothness of the function $g_0(x)$, the dimensionality of $x$, and the size of $d$. For example, for splines and power series and $d = 0$, this assumption will be satisfied with $\alpha = s/r$, where $s$ is the number of continuous derivatives of $g_0(x)$ that exist and $r$ is the dimension of $x$. It should also be noted that to derive a mean-square convergence rate only a mean-square approximation rate is needed (i.e. $E[\{g_0(x) - p^K(x)'\beta_K\}^2] = O(K^{-2\alpha})$), rather than the uniform approximation rate specified here, but for simplicity we do not make this generalization explicit.

To state the general convergence rate result, some notation is required. Let $F_0(x)$ denote the cumulative distribution function of $x_i$, and let

$$\zeta_d(K) = \max_{|\lambda| \le d} \sup_{x \in \mathcal{X}} \left\| \frac{\partial^{|\lambda|} P^K(x)}{\partial x_1^{\lambda_1} \cdots \partial x_r^{\lambda_r}} \right\|.$$

It will be assumed throughout that $\zeta_d(K) \ge 1$ for large enough $K$, and that $\zeta_d(K)$ exists whenever it appears in the assumptions. The following result gives mean-square and uniform convergence rates.

Theorem 1: If Assumptions 1-3 are satisfied with $d = 0$ then

$$\int [g_0(x) - \hat g(x)]^2 \, dF_0(x) = O_p(K/n + K^{-2\alpha}).$$

Also, if Assumptions 1-3 are satisfied for some $d \ge 0$ then

$$|\hat g - g_0|_d = O_p\left(\zeta_d(K)\left[\sqrt{K}/\sqrt{n} + K^{-\alpha}\right]\right).$$


This result is similar to Newey (1994a), except that here $K$ is required not to depend on the data, and the side condition $\zeta_0(K)^2 K/n \to 0$ is different. The mean-square error result differs from that of Andrews and Whang (1990) in applying to integrated mean-square error rather than sample mean-square error, and in imposing Assumption 2. The conclusion for mean-square error leads to optimal convergence rates for power series and splines, i.e. rates that attain Stone's (1982) bound. The term $K/n$ essentially corresponds to a variance term and $K^{-2\alpha}$ to a bias term. When $K$ is chosen so that these two terms go to zero at the same rate, which occurs when $K$ goes to infinity at the same rate as $n^{1/(1+2\alpha)}$ (and the side condition $\zeta_0(K)^2 K/n \to 0$ is satisfied), the convergence rate will be $n^{-2\alpha/(1+2\alpha)}$. For power series and splines, where $\alpha = s/r$, the rate will be $n^{-2s/(r+2s)}$, which equals Stone's (1982) bound.
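The balancing argument above is simple exponent arithmetic. The sketch below, assuming $K$ grows exactly as $n^\kappa$, solves $\kappa - 1 = -2\alpha\kappa$ with exact fractions and checks the side condition $\zeta_0(K)^2K/n \to 0$, taking $\zeta_0(K)^2K$ to grow like $K^2$ for splines and $K^3$ for power series, as stated in the text. The function names are hypothetical.

```python
from fractions import Fraction


def balanced_rate(alpha):
    """For K ~ n^kappa, return (kappa, exponent) balancing K/n and K^(-2 alpha).

    Equating exponents, kappa - 1 = -2 * alpha * kappa, gives
    kappa = 1 / (1 + 2 alpha); the common rate is n^(-2 alpha / (1 + 2 alpha)).
    """
    alpha = Fraction(alpha)
    kappa = 1 / (1 + 2 * alpha)
    return kappa, kappa - 1


def side_condition_holds(alpha, power):
    """Check zeta_0(K)^2 K / n -> 0 when zeta_0(K)^2 K grows like K^power.

    power = 2 for regression splines (zeta_0 = C sqrt(K)),
    power = 3 for power series (zeta_0 = C K).
    """
    kappa, _ = balanced_rate(alpha)
    return power * kappa < 1


# Power series with s = 4 derivatives in r = 2 dimensions: alpha = s / r = 2.
kappa, exponent = balanced_rate(Fraction(4, 2))
print(kappa, exponent, side_condition_holds(Fraction(4, 2), power=3))  # prints: 1/5 -4/5 True
```

With $\alpha = s/r$ the exponent reduces to $-2s/(r+2s)$, matching Stone's bound; for power series the side condition fails at the balancing choice when $s \le r$.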

The uniform convergence rates will not be optimal. For example, for splines and $d = 0$, where $\zeta_0(K) = C\sqrt{K}$, the corresponding rate will be $K/\sqrt{n} + K^{1/2 - s/r}$, which cannot attain Stone's (1982) bound. Nevertheless, these uniform convergence rates improve on some in the literature, e.g. on Cox (1988). Also, it is not yet known whether it is possible to attain the optimal uniform convergence rates using a series estimator.


3. Asymptotic Normality

There are many applications where a functional of a conditional expectation is of interest. For example, a common practice in demand analysis is estimation of the demand function in log-linear form, where $y$ is the log of consumption, $x$ is a two-dimensional vector with first argument equal to the log of price and second the log of income, and the estimated demand function is $e^{\hat g(x)}$. A functional of interest in demand analysis is approximate consumer surplus, equal to the integral of the demand function over a range of prices. For a fixed income $I$ an estimator of this functional would be

$$\hat\theta = \int_{\underline{p}}^{\bar p} e^{\hat g(\ln t, \ln I)} \, dt. \tag{5}$$

An asymptotic normality result for this functional could be useful in constructing approximate confidence intervals and tests.
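Equation (5) only requires a one-dimensional integral of the fitted demand. As a sketch, the function below approximates it by composite Simpson's rule (a numerical choice that is not from the paper) for any fitted function of (log price, log income); the log-linear demand used for the check is a hypothetical example with a closed-form surplus.

```python
import math


def consumer_surplus(g, income, p_low, p_high, m=2000):
    """theta_hat = int_{p_low}^{p_high} exp(g(ln t, ln I)) dt, equation (5).

    g is any fitted function of (log price, log income), e.g. a series
    estimate g_hat. The integral uses composite Simpson's rule with m (even)
    subintervals.
    """
    if m % 2:
        raise ValueError("m must be even for Simpson's rule")
    h = (p_high - p_low) / m
    total = 0.0
    for i in range(m + 1):
        t = p_low + i * h
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * math.exp(g(math.log(t), math.log(income)))
    return total * h / 3


def g_example(lp, li):
    """Hypothetical log-linear demand with unit price elasticity."""
    return 0.3 + 0.5 * li - lp


# Demand is e^0.3 * I^0.5 / p, so the exact surplus is e^0.3 * I^0.5 * ln(p_high / p_low).
approx = consumer_surplus(g_example, income=4.0, p_low=1.0, p_high=2.0)
exact = math.exp(0.3) * 2.0 * math.log(2.0)
```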

This Section gives conditions for asymptotic normality of functionals of series estimates. In this Section we focus on the "slower than $1/\sqrt{n}$" case, and discuss the $\sqrt{n}$-consistent case in the next Section. To describe the results, let $a(\cdot)$ be a vector functional of $g$, i.e. a mapping from a possible conditional expectation function to a real vector. The estimator will be assumed to take the form

$$\hat\theta = a(\hat g). \tag{6}$$

For example, the approximate consumer surplus estimator satisfies this equation with $a(g) = \int_{\underline{p}}^{\bar p} e^{g(\ln t, \ln I)} \, dt$. The true value corresponding to this estimator will be

$$\theta_0 = a(g_0), \tag{7}$$

where $g_0$ denotes the true conditional expectation function.

To use $\hat\theta$ for approximate inference procedures, it is important to have an asymptotic variance estimator. One can be formed from a delta-method estimator of the variance of $\hat\theta$ as a function of the estimated coefficients $\hat\beta$. Let

$$\hat A = \left. \partial a(p^{K\prime}\beta)/\partial\beta \right|_{\beta = \hat\beta}$$

when $\hat A$ exists, and otherwise let $\hat A$ be any vector with the same dimension as $\hat\beta$. The regularity conditions given below will imply that $\hat A$ exists with probability that approaches one in large samples. Let

$$\hat V = \hat A'\hat Q^{-1}\hat\Sigma\hat Q^{-1}\hat A, \qquad \hat Q = P'P/n, \qquad \hat\Sigma = \sum_{i=1}^n p^K(x_i)p^K(x_i)'[y_i - \hat g(x_i)]^2/n. \tag{8}$$

This estimator is just the usual one for a nonlinear function of least squares coefficients. The vector $\hat A$ is a Jacobian term, and $\hat Q^{-1}\hat\Sigma\hat Q^{-1}$ is the White (1980) estimator of the least squares asymptotic variance for a possibly misspecified model. This estimator will lead to correct asymptotic inferences because it accounts properly for variance, and because bias will be small relative to variance under the regularity conditions discussed below.
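A minimal sketch of equation (8) for a scalar functional follows. It assumes the caller supplies the regressor rows $p^K(x_i)$, the outcomes, $\hat\beta$, and a function mapping coefficients $b$ to $a(p^{K\prime}b)$; the Jacobian $\hat A$ is taken by central finite differences, which is an implementation choice (analytic derivatives work equally well). Because $\hat Q$ is symmetric, $\hat V = z'\hat\Sigma z$ with $z = \hat Q^{-1}\hat A$, so only one linear solve is needed. All names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def delta_method_variance(P, y, beta, a_of_beta, eps=1e-6):
    """V_hat = A_hat' Q_hat^{-1} Sigma_hat Q_hat^{-1} A_hat from equation (8).

    P: rows p^K(x_i); y: outcomes; beta: least squares beta_hat;
    a_of_beta: maps a coefficient vector b to the scalar a(p^K' b).
    """
    n, K = len(P), len(beta)
    A = []  # A_hat by central finite differences
    for k in range(K):
        up, dn = list(beta), list(beta)
        up[k] += eps
        dn[k] -= eps
        A.append((a_of_beta(up) - a_of_beta(dn)) / (2 * eps))
    Q = [[sum(r[j] * r[k] for r in P) / n for k in range(K)] for j in range(K)]
    resid = [yi - sum(b * p for b, p in zip(beta, r)) for r, yi in zip(P, y)]
    Sigma = [[sum(r[j] * r[k] * e * e for r, e in zip(P, resid)) / n
              for k in range(K)] for j in range(K)]
    z = solve(Q, A)  # z = Q_hat^{-1} A_hat
    return sum(z[j] * Sigma[j][k] * z[k] for j in range(K) for k in range(K))


# Usage with an intercept-only "series" (K = 1): theta = exp(mean of y).
y = [1.0, 2.0, 3.0, 4.0]
P = [[1.0] for _ in y]
beta_hat = [sum(y) / len(y)]
v_linear = delta_method_variance(P, y, beta_hat, lambda b: b[0])
v_exp = delta_method_variance(P, y, beta_hat, lambda b: math.exp(b[0]))
```

In the intercept-only case $\hat V$ reduces to the sample variance of $y$ (divisor $n$), scaled by the squared Jacobian for the nonlinear functional.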

Some additional conditions are important for the asymptotic normality result.

Assumption 4: $E[\{y - g_0(x)\}^4 | x]$ is bounded, and $\mathrm{Var}(y|x)$ is bounded away from zero.

This assumption requires that the fourth conditional moment of the error is bounded, strengthening Assumption 1. The next one is a smoothness condition on $a(g)$, requiring that it be approximated sufficiently well by a linear functional when $g$ is close to $g_0$.

Assumption 5: Either a) $a(g)$ is linear in $g$; or b) for $d$ as in Assumption 3, $\zeta_d(K)^4 K^2/n \to 0$ and there exists a function $D(g;\tilde g)$ that is linear in $g$ and such that for some $C, \varepsilon > 0$ and all $g, \tilde g$ with $|g - g_0|_d < \varepsilon$, $|\tilde g - g_0|_d < \varepsilon$, it is true that $\|a(g) - a(\tilde g) - D(g - \tilde g;\tilde g)\| \le C(|g - \tilde g|_d)^2$ and $\|D(g;\tilde g) - D(g;g_0)\| \le C|g|_d|\tilde g - g_0|_d$.

The interpretation of $D(g;\tilde g)$ is that it is a functional derivative of $a(g)$. Indeed, this assumption implies that $a(g)$ is Fréchet differentiable in $g$ with respect to the norm $|g|_d$.

This condition is often straightforward to verify. In the consumer surplus example it is easy to check that it will be satisfied with $d = 0$ and

$$D(g;\tilde g) = \int_{\underline{p}}^{\bar p} g(\ln t, \ln I)\, e^{\tilde g(\ln t, \ln I)} \, dt, \tag{9}$$

as long as $[\ln \underline{p}, \ln \bar p] \times \{\ln I\}$ is contained in the support of $x$. To see that this is so, note that for $|g - g_0|_0 < \varepsilon$ and $|\tilde g - g_0|_0 < \varepsilon$, $g$ and $\tilde g$ will be uniformly bounded on the range of integration. Also, for any scalar $z$, $de^z/dz = e^z$, so that a mean-value expansion of $e^z$ around some other point $\tilde z$ gives $|e^z - e^{\tilde z} - e^{\tilde z}(z - \tilde z)| \le C|z - \tilde z|^2$ for some constant $C$ when $z$ and $\tilde z$ are in some bounded set. Therefore,

$$\|a(g) - a(\tilde g) - D(g - \tilde g;\tilde g)\| \le \int_{\underline{p}}^{\bar p} C|g(\ln p, \ln I) - \tilde g(\ln p, \ln I)|^2 \, dp \le C(\bar p - \underline{p})(|g - \tilde g|_0)^2.$$
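The quadratic remainder bound can be checked numerically. The sketch below uses a hypothetical base function $\tilde g(\ln p, \ln I) = 0.2 - 0.5\ln p$, perturbs it by $\delta\cos(3\ln p)$ so that $|g - \tilde g|_0 \le \delta$, and takes prices in $[1, 2]$ with $\ln I = 0$. For $\delta \le 0.2$ both functions are bounded by $0.4$ there, so the mean-value argument gives the constant $C = e^{0.4}/2$; the ratio remainder$/\delta^2$ should stay below $C(\bar p - \underline p)$ and settle down as $\delta$ shrinks.

```python
import math


def simpson(f, lo, hi, m=2000):
    """Composite Simpson approximation to int_lo^hi f(p) dp (m must be even)."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def a_surplus(g, p_low, p_high, log_income):
    """a(g) = int exp(g(ln p, ln I)) dp over [p_low, p_high]."""
    return simpson(lambda p: math.exp(g(math.log(p), log_income)), p_low, p_high)


def d_surplus(h, g_tilde, p_low, p_high, log_income):
    """D(h; g_tilde) = int h(ln p, ln I) exp(g_tilde(ln p, ln I)) dp, as in (9)."""
    return simpson(
        lambda p: h(math.log(p), log_income) * math.exp(g_tilde(math.log(p), log_income)),
        p_low, p_high)


def remainder(g_tilde, direction, delta, p_low=1.0, p_high=2.0, log_income=0.0):
    """|a(g) - a(g_tilde) - D(g - g_tilde; g_tilde)| for g = g_tilde + delta * direction."""
    def g(lp, li):
        return g_tilde(lp, li) + delta * direction(lp, li)

    def diff(lp, li):
        return delta * direction(lp, li)

    return abs(a_surplus(g, p_low, p_high, log_income)
               - a_surplus(g_tilde, p_low, p_high, log_income)
               - d_surplus(diff, g_tilde, p_low, p_high, log_income))


def base(lp, li):
    return 0.2 - 0.5 * lp  # hypothetical g_tilde, bounded on the price range


def direction(lp, li):
    return math.cos(3.0 * lp)  # |direction| <= 1, so |g - g_tilde|_0 <= delta


for delta in (0.1, 0.05, 0.025):
    print(delta, remainder(base, direction, delta) / delta ** 2)
```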


The next requirement imposes some continuity conditions on the derivative. Let $D(g) = D(g;g_0)$ denote the derivative at $\tilde g = g_0$.

Assumption 5: $a(g)$ is a scalar, there exists $C$ such that $|D(g)| \le C|g|_d$ for $d$ from Assumption 3, and there exists $g_K(x) = p^K(x)'\beta_K$ such that $E[g_K(x)^2] \to 0$ and $D(g_K)$ is bounded away from zero.


This assumption says that the derivative is continuous in $|g|_d$ but not in the mean-square norm $(E[g(x)^2])^{1/2}$. The lack of mean-square continuity implies that the estimator $\hat\theta$ is not $\sqrt{n}$-consistent, and the assumption is also a useful regularity condition. Another restriction imposed is that $a(g)$ is a scalar, which is general enough to cover many cases of interest. When $a(g)$ is a vector, asymptotic normality would follow from conditions like those of Andrews (1991), which are difficult to verify. In contrast, Assumption 5 is a primitive condition that is relatively easy to verify.

For example, for the consumer surplus estimator previously discussed where $p^K(x)$ is a power series, suppose that $x$ is continuously distributed with compact support and bounded density, and that the set $\mathcal{T} = \{(\ln p, \ln I) : \underline{p} \le p \le \bar p\}$ is contained in this support. Then there exists a sequence of continuous functions $g_j(x)$ such that $g_j(x)$ is equal to $1/(\bar p - \underline{p})$ on the line $\mathcal{T}$, is uniformly bounded, and converges to zero everywhere else. For this sequence, $D(g_j) = \int g_j(\ln p, \ln I)\exp[g_0(\ln p, \ln I)]\,dp \ge C\int g_j(\ln p, \ln I)\,dp = C$ for a constant $C > 0$, and $E[g_j(x)^2] \to 0$ by dominated convergence, since $\mathcal{T}$ has probability zero. By the Weierstrass approximation theorem, any $g_j(x)$ can be approximated uniformly by a polynomial $q_j(x)$, so that $D(q_j) > C/2$ and $E[q_j(x)^2] \to 0$. Since a polynomial is a linear combination of power series terms, it follows that Assumption 5 is satisfied in this example.

To state the asymptotic normality result it is useful to work with an asymptotic variance formula. Let $\sigma^2(x) = \mathrm{Var}(y|x)$ and

$$A = (D(p_{1K}), \ldots, D(p_{KK}))'.$$

The asymptotic variance formula is

$$V_K = A'Q^{-1}\Sigma Q^{-1}A, \qquad Q = E[p^K(x)p^K(x)'], \qquad \Sigma = E[p^K(x)p^K(x)'\sigma^2(x)]. \tag{10}$$


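Theorem 2: If Assumptions 1-5 are satisfied and $\sqrt{n}K^{-\alpha} \to 0$ then $\hat\theta = \theta_0 + O_p(\zeta_d(K)/\sqrt{n})$ and

$$\sqrt{n}V_K^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1), \qquad \sqrt{n}\hat V^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1).$$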


This theorem includes as a special case linear functionals, which were studied by Andrews (1991). In that case the conditions of Theorem 2 are quite simple relative to the conditions of Andrews (1991). Also, the restriction $\zeta_0(K)^2 K/n \to 0$ is weaker than that of Andrews (1991), leading to only requiring that $K^3/n \to 0$ for power series, rather than the more restrictive growth condition on $K$ in Andrews (1991). One reason for this contrast is that the regressors are assumed to be i.i.d. here, while the general results of Andrews (1991) allow the regressors not to be identically distributed.




This result only gives an upper bound $O_p(\zeta_d(K)/\sqrt{n})$ on the convergence rate for $\hat\theta$. This bound may not be sharp. We know it will not be sharp in the $\sqrt{n}$-consistent case considered next, where $d = 0$. Whether it is sharp in other cases is still an open question.


4. $\sqrt{n}$-Consistency

The key condition for $\sqrt{n}$-consistency is that the derivative $D(g)$ be mean-square continuous, as specified in the following hypothesis.

Assumption 6: There is $v(x)$ with $E[v(x)v(x)']$ finite and nonsingular such that $D(g_0) = E[v(x)g_0(x)]$, $D(p_{kK}) = E[v(x)p_{kK}(x)]$ for all $k$ and $K$, and there is $\beta_K$ with $E[\|v(x) - \beta_K p^K(x)\|^2] \to 0$.

This condition allows for $a(g)$ to be a vector. It requires a representation of the derivative as an expected product, when $g$ is equal to the truth or any of the approximating functions, and for the function $v(x)$ in this representation to be approximated in mean square by some linear combination of the approximating functions. This condition and Assumption 5 are mutually exclusive, and together they appear to cover most cases of interest.

A sufficient condition for Assumption 6 is that the functional $a(g)$ be mean-square continuous in $g$ over some linear domain that includes the truth and the approximating functions, and that the approximating functions form a basis for this domain. The representation in Assumption 6 will then follow from the Riesz representation theorem. This condition is somewhat like Van der Vaart's (1991) condition for $\sqrt{n}$-consistent estimability of functionals, except that his mean-square continuity hypothesis pertains to the set of scores rather than to a set of conditional expectations. Also, here it is a sufficient condition for $\sqrt{n}$-consistency of a particular estimator, rather than a necessary condition for the existence of such an estimator. A similar condition was also used by Newey (1994b) to derive primitive conditions for $\sqrt{n}$-consistency of series estimators, under stronger regularity conditions.

There are many interesting examples of functionals that satisfy this condition. One example is an average consumer surplus estimator, like that of equation (5) but integrated over income. Consider the functional

$$a(g) = \int v(I) \int_{\underline{p}}^{\bar p} \exp(g(\ln p, \ln I)) \, dp \, dI, \tag{11}$$

where $v(I)$ is some weight function with $\int v(I)\,dI = 1$. This is an average of consumer surplus over different income values. An estimator of this functional could be used as a summary of consumer surplus as a function of income. Similarly to equation (9), the derivative of this functional is

$$D(g) = \int\!\!\int v(I,p)\, g(\ln p, \ln I) \, dp \, dI, \qquad v(I,p) = 1(\underline{p} \le p \le \bar p,\ \underline{I} \le I \le \bar I)\, v(I) \exp(g_0(\ln p, \ln I)). \tag{12}$$

Assumption 6 will then be satisfied, with $v(x) = f(I,p)^{-1} v(I,p)$, where $f(I,p)$ is the density of $I$ and $p$, assumed to be bounded away from zero on $[\underline{p}, \bar p] \times [\underline{I}, \bar I]$.

Another example is the coefficients $\gamma_0$ from the partially linear model of equation (4). Suppose that $E[\mathrm{Var}(x_a|x_b)]$ is nonsingular, an identification condition for $\gamma_0$, and let $U = x_a - E[x_a|x_b]$ and $v(x) = (E[UU'])^{-1}U$. Then for $a(g) = E[v(x)g(x)]$, since $E[Ux_a'] = E[UU']$ and $E[Ug_b(x_b)] = 0$,

$$a(g_0) = E[v(x)g_0(x)] = (E[UU'])^{-1}(E[Ux_a']\gamma_0 + E[Ug_b(x_b)]) = \gamma_0. \tag{13}$$

In this example $a(g)$ is a linear functional of $g$, and $a(g) = D(g)$ satisfies Assumption 6 by construction.
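Equation (13) can be verified exactly on a small discrete design. The sketch below assumes a hypothetical design with scalar $x_a$: $x_b$ uniform on a few points, $x_a = m(x_b) + u$ with $u$ uniform on a finite set and independent of $x_b$. It builds $U = x_a - E[x_a|x_b]$ and $v(x) = U/E[U^2]$ by enumeration and returns $E[v(x)g_0(x)]$, which should equal $\gamma_0$ whatever the nonparametric component $g_b$ is.

```python
def partially_linear_functional(gamma0, xb_values, m, u_values, g_b):
    """Exact E[v(x) g0(x)] for g0(x) = gamma0 * x_a + g_b(x_b), scalar x_a.

    Hypothetical discrete design: x_b uniform on xb_values, x_a = m(x_b) + u,
    u uniform on u_values and independent of x_b. Then
    U = x_a - E[x_a | x_b] and v(x) = U / E[U^2], as in equation (13).
    """
    pb, pu = 1.0 / len(xb_values), 1.0 / len(u_values)
    cells = [(xb, m(xb) + u, pb * pu) for xb in xb_values for u in u_values]
    cond_mean = {xb: pu * sum(m(xb) + u for u in u_values) for xb in xb_values}
    e_uu = sum(w * (xa - cond_mean[xb]) ** 2 for xb, xa, w in cells)
    if e_uu <= 0:
        raise ValueError("E[Var(x_a | x_b)] must be positive (identification)")

    def v(xa, xb):
        return (xa - cond_mean[xb]) / e_uu

    def g0(xa, xb):
        return gamma0 * xa + g_b(xb)

    return sum(w * v(xa, xb) * g0(xa, xb) for xb, xa, w in cells)


value = partially_linear_functional(
    gamma0=1.7,
    xb_values=[0.0, 1.0, 2.0],
    m=lambda b: b ** 2,            # E[x_a | x_b] varies with x_b (hypothetical)
    u_values=[-1.0, 0.5, 0.5],
    g_b=lambda b: 3 * b - b ** 3,  # arbitrary nonparametric component
)
```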

A third example is a weighted average derivative, where

$$a(g) = \int w(x)[\partial g(x)/\partial x] \, dx, \qquad \int w(x)\,dx = 1, \qquad w(x) \ge 0. \tag{14}$$

This functional is useful for estimating scaled coefficients of an index model, as discussed in Stoker (1986), and can also be used to quantify the average slope of a function. Assuming that $w(x)$ is zero outside some compact set and that $x$ is continuously distributed with density $f(x)$ that is bounded away from zero on the set where $w(x)$ is positive, integration by parts gives

$$a(g) = -\int [\partial w(x)/\partial x] g(x) \, dx = E[v(x)g(x)], \qquad v(x) = -f(x)^{-1}\partial w(x)/\partial x. \tag{15}$$

Therefore, Assumption 6 is satisfied for this functional, and hence Theorem 3 below will give $\sqrt{n}$-consistency of a series estimator of this functional.
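The integration-by-parts step in (15) can be checked in one dimension. The sketch assumes a hypothetical weight $w(x) = 0.75(1 - x^2)$ on $[-1, 1]$ (nonnegative, integrating to one, and vanishing at the endpoints so the boundary term drops out), $x$ uniform on $[-1, 1]$ with $f(x) = 1/2$, and an arbitrary smooth $g$. Both sides of (15) are computed by composite Simpson's rule.

```python
import math


def simpson(f, lo, hi, m=2000):
    """Composite Simpson approximation to int_lo^hi f(x) dx (m must be even)."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def w(x):
    return 0.75 * (1.0 - x * x)  # hypothetical weight, zero at x = -1 and x = 1


def dw(x):
    return -1.5 * x  # derivative of w


def f(x):
    return 0.5  # uniform density on [-1, 1]


def g(x):
    return math.sin(2.0 * x) + x ** 3  # arbitrary smooth regression function


def dg(x):
    return 2.0 * math.cos(2.0 * x) + 3.0 * x ** 2


def v(x):
    return -dw(x) / f(x)  # representer v(x) from equation (15)


lhs = simpson(lambda x: w(x) * dg(x), -1.0, 1.0)        # a(g) as in (14)
rhs = simpson(lambda x: v(x) * g(x) * f(x), -1.0, 1.0)  # E[v(x) g(x)]
```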

The asymptotic variance of the estimator will be determined by the function $v(x)$ from Assumption 6. It will be equal to

$$V = E[v(x)v(x)'\mathrm{Var}(y|x)].$$

Theorem 3: If Assumptions 1-4 and 6 are satisfied for $d = 0$, and $\sqrt{n}K^{-\alpha} \to 0$, then $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V)$ and $\hat V \xrightarrow{p} V$.


5. Power Series

One particular type of series for which primitive regularity conditions can be specified is a power series. Suppose that $r$ is the dimension of $x$, let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote a vector of nonnegative integers, i.e. a multi-index, with norm $|\lambda| = \sum_{j=1}^r \lambda_j$, and let $x^{\lambda} = \prod_{j=1}^r (x_j)^{\lambda_j}$. For a sequence $(\lambda(k))_{k=1}^{\infty}$ of distinct such vectors, a power series approximation has

$$p_{kK}(x) = x^{\lambda(k)}. \tag{16}$$

It will be assumed that the multi-index sequence is ordered so that the degree $|\lambda(k)|$ is nondecreasing in $k$.
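One way to build a sequence satisfying this ordering is sketched below: enumerate all multi-indices degree by degree and take the first $K$. Ties within a degree are broken in reverse lexicographic order, which is an arbitrary choice not specified in the text; the function names are hypothetical.

```python
from itertools import product
from math import comb


def multi_indices(r, max_degree):
    """Multi-indices lambda in N^r with |lambda| <= max_degree, ordered so that
    the degree |lambda| is nondecreasing (ties in reverse lexicographic order)."""
    ordered = []
    for degree in range(max_degree + 1):
        level = [lam for lam in product(range(degree + 1), repeat=r)
                 if sum(lam) == degree]
        ordered.extend(sorted(level, reverse=True))
    return ordered


def power_term(x, lam):
    """x^lambda = prod_j x_j^lambda_j, as in equation (16)."""
    value = 1.0
    for xj, lj in zip(x, lam):
        value *= xj ** lj
    return value


def power_series(x, K):
    """p^K(x) = (x^lambda(1), ..., x^lambda(K))' using the ordering above."""
    r = len(x)
    degree = 0
    while comb(degree + r, r) < K:  # count of multi-indices with |lambda| <= degree
        degree += 1
    return [power_term(x, lam) for lam in multi_indices(r, degree)[:K]]


print(multi_indices(2, 2))
```

For two regressors this yields $1, x_1, x_2, x_1^2, x_1x_2, x_2^2, \ldots$, so every basis truncation contains all terms of lower degree.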

The theory to follow uses orthonormal polynomials, which may also have computational advantages. In computation, replacing each power $x^{\lambda}$ by the product of orthonormal polynomials of orders corresponding to the components of $\lambda$, with respect to some distribution, may lead to reduced collinearity, particularly when the distribution closely matches that of the data. The estimator will be numerically invariant to such a replacement, because $|\lambda(k)|$ is nondecreasing in $k$.
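As an illustration of the collinearity point, the sketch below uses Legendre polynomials rescaled to be orthonormal under the uniform distribution on $[-1, 1]$, $q_n(x) = \sqrt{2n+1}\,P_n(x)$, computed by the standard three-term recurrence. The choice of distribution is an assumption made for illustration. Their second moment matrix is the identity, while raw powers give a nondiagonal matrix (for instance $E[1 \cdot x^2] = 1/3$).

```python
import math


def orthonormal_legendre(x, K):
    """First K polynomials orthonormal under x ~ Uniform[-1, 1]:
    q_n(x) = sqrt(2n + 1) P_n(x), with Legendre P_n from the recurrence
    (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}."""
    P = [1.0, x]
    for n in range(1, K - 1):
        P.append(((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1))
    return [math.sqrt(2 * n + 1) * P[n] for n in range(K)]


def raw_powers(x, K):
    """Unnormalized power basis (1, x, ..., x^(K-1)) for comparison."""
    return [x ** n for n in range(K)]


def gram_uniform(basis, K, m=2000):
    """E[q(x) q(x)'] for x ~ Uniform[-1, 1], by composite Simpson's rule."""
    h = 2.0 / m
    G = [[0.0] * K for _ in range(K)]
    for i in range(m + 1):
        x = -1.0 + i * h
        weight = (1 if i in (0, m) else (4 if i % 2 else 2)) * h / 3 * 0.5  # density 1/2
        q = basis(x, K)
        for j in range(K):
            for k in range(K):
                G[j][k] += weight * q[j] * q[k]
    return G


G_orth = gram_uniform(orthonormal_legendre, 6)
G_raw = gram_uniform(raw_powers, 6)
```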

For power series, primitive conditions for Assumptions 2 and 3 can be specified. Assumption 2 will be a consequence of the following condition.

Assumption 7: The support of $x$ is a Cartesian product of compact connected intervals on which $x$ has a probability density function that is bounded away from zero.

This assumption can be relaxed by specifying that it only holds for a component of the distribution of $x$ (which would allow points of positive probability in the support of $x$), but it appears difficult to be more general.

To be specific about Assumption 3 we need uniform approximation rates for power series. These rates will follow from the following smoothness assumption.

Assumption 8: $g_0(x) = E[y|x]$ is continuously differentiable of order $s$ on the support of $x$.

It follows from this condition and Lorentz (1986) that for $d = 0$ the approximation rate of Assumption 3 is $\alpha = s/r$, where $r$ is the dimension of $x$. A literature search has not yet revealed a corresponding result for $d > 0$, where derivatives are also approximated, although it is known that if $E[y|x]$ is analytic, Assumption 3 will hold for $\alpha$ any positive number. Because of this absence of approximation rates for derivatives, the rest of the Section will confine discussion to the case where $d = 0$. The first result for polynomials gives convergence rates.
Theorem 4: If Assumptions 1, 7, and 8 are satisfied and $K^3/n \to 0$ then

$$\int [g_0(x) - \hat g(x)]^2 \, dF_0(x) = O_p(K/n + K^{-2s/r}), \qquad |\hat g - g_0|_0 = O_p\left(K\left[\sqrt{K}/\sqrt{n} + K^{-s/r}\right]\right).$$


This result gives mean-square and uniform convergence rates for $\hat g$. Uniform convergence rates for derivatives are not given for the reason discussed above, that approximation rates for derivatives are not readily available in the literature.

As previously noted, the mean-square convergence rate attains Stone's (1982) bound for $K = Cn^{r/(r+2s)}$ and $s > r$ (implying $K^3/n \to 0$). The mean-square rate obtained here differs from that of Andrews and Whang (1990) because it is for the population mean-square error rather than the sample mean-square error, which is appealing because it is a fixed norm rather than one that changes with the sample size and configuration of the regressors. On the other hand, Assumption 7 is stronger than the conditions imposed in Andrews and Whang (1990).

The uniform convergence rate does not appear to be optimal, although it seems to improve on rates existing in the literature, being faster than that of Cox (1988). One implication of this rate is that $\hat g$ will be uniformly consistent when $s > r$ and $K^3/n \to 0$.

This result can also be extended to additive models, such as the one in equation (3). A power series estimator could be constructed as described above, except that products of powers from both $x_a$ and $x_b$ are excluded. If each of $g_a(x_a)$ and $g_b(x_b)$ is continuously differentiable of order $s$, the exclusion of the (many) interaction terms will increase the approximation rate to $K^{-s/\bar r}$, where $\bar r$ is the maximum dimension of $x_a$ and $x_b$. The conclusion of Theorem 4 will then hold with $r$ replaced by $\bar r$. This gives mean-square convergence rates for polynomials like the regression spline rates of Stone (1985). An extension to the partially linear case would also be similar, with $r$ replaced by the dimension of $x_b$, and Assumption 7 only required to hold for $x_b$ rather than $x$. Because these extensions are straightforward, explicit results are not given.

Assumptions 7 and 8 are primitive conditions for Assumptions 2 and 3, so that asymptotic normality and $\sqrt{n}$-consistency results will follow under the other conditions of Sections 3 and 4. All of these other conditions are primitive and straightforward to verify, except for Assumption 5. The following more primitive version of Assumption 5 is easier to check and will suffice for Assumption 5 for power series.

Assumption 9: $a(g)$ is a scalar and there exists a sequence of continuous functions $\{g_j(x)\}_{j=1}^{\infty}$ such that $D(g_j)$ is bounded away from zero but $E[g_j(x)^2] \to 0$.

An  asymptotic  normality  result  for  power  series  estimators  is  given  by  the  following: 

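Theorem 5: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 7-9 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^6/n \to 0$ or $a(g)$ is linear and $K^3/n \to 0$, then
$$\hat\theta = \theta_0 + O_p(K/\sqrt{n}) \quad\text{and}\quad \sqrt{n}\,\hat V^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1).$$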

The  rate  conditions  of  this  result  require  that     s  >  3r/2     in  the  case  of  a  linear 
functional,  or  that     s  >  3r     in  the  case  of  a  nonlinear  functional.     In  this  sense  a 
certain  amount  of  smoothness  of  the  regression  function  is  required  for  asymptotic 
normality. 
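The smoothness thresholds quoted above can be recovered by a short calculation. Write $K = n^\gamma$, which is an assumed growth path. A condition $K^p/n \to 0$ then needs $\gamma < 1/p$, while $\sqrt{n}K^{-s/r} \to 0$ needs $\gamma > r/(2s)$. Both hold for some $\gamma$ exactly when $s > pr/2$, so $p = 3$ gives $s > 3r/2$ and $p = 6$ gives $s > 3r$. The helper name below is hypothetical.

```python
from fractions import Fraction


def smoothness_threshold(p, r):
    """Threshold p*r/2 that s must exceed so that K**p/n -> 0 and
    sqrt(n) * K**(-s/r) -> 0 can hold together for some K = n**gamma."""
    return Fraction(p * r, 2)


# p = 3 and p = 6 reproduce the thresholds 3r/2 and 3r (here r = 2).
print(smoothness_threshold(3, 2), smoothness_threshold(6, 2))  # 3 6
```

The same arithmetic with $p = 2$ and $p = 4$ gives $r$ and $2r$.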

For the consumer surplus example, it has already been shown that Assumptions 4 and 9 are satisfied. Since it is nonlinear, and since the dimension of $x$ is 2 in this case, asymptotic normality will follow from Assumptions 7 and 8 and from $K^6/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$.

A result can also be formulated for $\sqrt{n}$-consistency of a functional of a polynomial regression estimator.

Theorem 6: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 6-8 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^6/n \to 0$ or $a(g)$ is linear and $K^3/n \to 0$, then
$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,V) \quad\text{and}\quad \hat V \xrightarrow{p} V.$$
The conditions of this result are very similar to those of Theorem 5, except that Assumption 9 has been replaced by Assumption 6. More primitive conditions for Assumption 6 are not given because it is already in a form that is straightforward to verify, as demonstrated by the examples of Section 4. The average consumer surplus estimator satisfies Assumptions 4 and 6 with $d = 0$, so that under Assumptions 7 and 8, $K^6/n \to 0$, and $\sqrt{n}K^{-s/2} \to 0$ it will be $\sqrt{n}$-consistent. Also, the partially linear coefficients and weighted average derivative examples satisfy the assumptions and are linear functionals, so that they will be $\sqrt{n}$-consistent if $K^3/n \to 0$ and $\sqrt{n}K^{-s/r} \to 0$.


6.        Regression  Splines 

Regression  splines  are  linear  combinations  of  functions  that  are  smooth  piecewise 
polynomials  of  a  given  order  with  fixed  knots  (join  points).     Spline  approximating 
functions  have  attractive  features  relative  to  polynomials,  being  less  sensitive  to 
outliers  and  to  bad  approximation  over  small  regions.     They  have  the  disadvantage,  as  far 
as the theory here is concerned, that the support of x must be known. A known support is
required  for  knot  placement.     Alternatively,  the  results  could  also  be  obtained  for 
estimators  that  only  use  data  where     x     is  in  some  specified  region,  but  the  conclusion 
would  have  to  be  modified  for  that  case.     In  particular,  the  uniform  and  mean-square 
convergence  rates  would  only  apply  to  the  true  function  over  the  region  of  the  data  that 
is  used,  and  asymptotic  normality  would  only  hold  for  functionals  that  did  not  depend  on 
the function $g_0$ for x outside the region. For simplicity, this section will only
consider  the  case  where  the  support  is  known  and  satisfies  Assumption  7,  where  the 
following  condition  can  be  imposed  without  loss  of  generality. 

Assumption 10: The support of $x$ is $[-1,1]^r$.
When the support of x is known and Assumption 7 is satisfied, x can always be rescaled so that this condition holds.

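To describe a spline, let $(x)_+ = 1(x > 0)\cdot x$. A spline basis for a univariate spline of degree $m$ with $L-1$ knots is
$$p_{\ell L}(u) = \begin{cases} u^{\ell-1}, & 1 \le \ell \le m+1,\\ \{[u + 1 - 2(\ell - m - 1)/L]_+\}^m, & m+2 \le \ell \le m+L. \end{cases} \qquad (17)$$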
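The Python sketch below evaluates the truncated power basis (17) at a single point. It assumes $m \ge 1$ and uses the knots $-1 + 2j/L$, $j = 1,\dots,L-1$, that (17) implies. The function name is hypothetical.

```python
def spline_basis_1d(u, m, L):
    """Truncated power basis of (17): degree m >= 1 on [-1, 1] with L - 1
    equally spaced interior knots at -1 + 2j/L, j = 1, ..., L - 1.
    Returns the m + L values p_{1L}(u), ..., p_{m+L,L}(u)."""
    values = [u ** (l - 1) for l in range(1, m + 2)]     # 1 <= l <= m + 1
    for l in range(m + 2, m + L + 1):                    # m + 2 <= l <= m + L
        knot = -1 + 2 * (l - m - 1) / L
        values.append(max(u - knot, 0.0) ** m)           # {[u - knot]_+}^m
    return values


print(spline_basis_1d(0.5, m=1, L=2))   # [1.0, 0.5, 0.5]
```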

A spline basis can be formed from products of these functions for the individual components of $x$. For $\{\lambda(k,K)\}_{k=1}^{K}$ the set of distinct $r$-tuples of integers with $1 \le \lambda_j(k,K) \le m+L$ for each $j$ and $k$, and $K = (m+L)^r$, let
$$p_{kK}(x) = \prod_{j=1}^{r} p_{\lambda_j(k,K),L}(x_j), \qquad (k = 1,\dots,K). \qquad (18)$$
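Once the univariate values from (17) are available for each coordinate, the tensor-product basis (18) is formed by multiplying them over every $r$-tuple of indices. This sketch enumerates the tuples in lexicographic order, which is one admissible ordering of $k = 1,\dots,K$. The helper name and the toy numbers are hypothetical.

```python
from itertools import product
from math import prod


def tensor_basis(per_coordinate):
    """Products (18) over all r-tuples: per_coordinate[j] holds the m + L
    univariate values p_{1L}(x_j), ..., p_{m+L,L}(x_j) for coordinate j.
    Returns K = (m + L)**r values, one per r-tuple (lambda_1, ..., lambda_r)."""
    return [prod(values) for values in product(*per_coordinate)]


# Toy univariate values (hypothetical numbers) for r = 2 coordinates.
print(tensor_basis([[1, 2], [3, 4]]))   # [3, 4, 6, 8]
```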

The theory to follow uses B-splines, which are linear transformations of the above functions that have lower multicollinearity. The low multicollinearity of B-splines and the recursive formula for their calculation also lead to computational advantages; e.g. see Powell (1981).
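A standard choice for the recursive formula mentioned above is the Cox-de Boor recursion. The sketch below uses it to evaluate all degree-$m$ B-splines on $[-1,1]$. It uses the same $L-1$ equally spaced interior knots as (17), with the end knots repeated $m+1$ times. This is a textbook construction and not necessarily the exact form given in Powell (1981); the helper names are hypothetical. The resulting $m+L$ functions span the same space as (17). The check at the end shows that they are nonnegative and sum to one at an interior point.

```python
def clamped_knots(m, L):
    """Knots for degree-m B-splines on [-1, 1] with L - 1 equally spaced
    interior knots; the end knots are repeated m + 1 times."""
    interior = [-1 + 2 * j / L for j in range(1, L)]
    return [-1.0] * (m + 1) + interior + [1.0] * (m + 1)


def bspline_basis(x, knots, degree):
    """All B-splines of the given degree at x by the Cox-de Boor recursion.
    Uses half-open intervals, so x = 1 (the right endpoint) returns zeros
    and would need separate handling."""
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]                   # degree 0
    for k in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - k - 1):
            left = right = 0.0
            if knots[i + k] > knots[i]:                   # skip 0/0 terms
                left = (x - knots[i]) / (knots[i + k] - knots[i]) * b[i]
            if knots[i + k + 1] > knots[i + 1]:
                right = ((knots[i + k + 1] - x)
                         / (knots[i + k + 1] - knots[i + 1]) * b[i + 1])
            nxt.append(left + right)
        b = nxt
    return b


vals = bspline_basis(0.3, clamped_knots(3, 4), 3)
print(len(vals), round(sum(vals), 12))   # 7 1.0
```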

The  rate  at  which  splines  uniformly  approximate  a  function  is  the  same  as  that  for 
power  series,  so  the  smoothness  condition  of  Assumption  8  will  be  left  unchanged.     For 
splines there do not seem to be approximation rates for derivatives readily available
in  the  literature,  so  we  limit  attention  to  the  case     d  =  0,     as  we  do  for  power  series. 

Theorem 7: If Assumptions 1, 7, 8, and 10 are satisfied then
$$\int[g_0(x) - \hat g(x)]^2dF_0(x) = O_p(K/n + K^{-2s/r}), \qquad |\hat g - g_0|_0 = O_p(\sqrt{K}[\sqrt{K}/\sqrt{n} + K^{-s/r}]).$$

It is interesting to note that the uniform convergence rate for splines is faster than that for power series. It may be that this is an artifact of the proof, rather than some intrinsic feature of the two types of approximation, but more work is needed to determine that.
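The two uniform rates differ only through $\zeta_0(K)$. It is of order $K$ for power series (proof of Theorem 4) and of order $K^{1/2}$ for splines (proof of Theorem 7). The sketch evaluates both bounds with constants dropped and hypothetical values of $n$, $K$, $s$ and $r$. Their ratio is therefore exactly $\sqrt{K}$. This compares bounds, not actual estimation errors.

```python
from math import sqrt


def uniform_rate(n, K, s, r, zeta0):
    """Order of the uniform bound zeta0 * [sqrt(K/n) + K**(-s/r)], constants dropped."""
    return zeta0 * (sqrt(K / n) + K ** (-s / r))


n, K, s, r = 10_000, 25, 4, 2                     # hypothetical values
power = uniform_rate(n, K, s, r, zeta0=K)         # power series, zeta_0(K) ~ K
spline = uniform_rate(n, K, s, r, zeta0=sqrt(K))  # splines, zeta_0(K) ~ K**(1/2)
print(power / spline)                             # ratio equals sqrt(K), about 5
```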
Asymptotic normality and $\sqrt{n}$-consistency follow as for power series.

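Theorem 8: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 7-10 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^4/n \to 0$ or $a(g)$ is linear and $K^2/n \to 0$, then
$$\hat\theta = \theta_0 + O_p(K^{1/2}/\sqrt{n}) \quad\text{and}\quad \sqrt{n}\,\hat V^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1).$$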

In comparison with Theorem 5, this asymptotic normality result for functionals of a spline estimator imposes less stringent conditions on the growth rate of $K$. Consequently, the smoothness conditions needed for asymptotic normality are less severe, requiring only $s > 2r$ for a nonlinear functional or $s > r$ for a linear one. As in the case of polynomials, the consumer surplus functional satisfies the conditions of this result, so that it will be asymptotically normal for $K^4/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$. A $\sqrt{n}$-consistency result for splines can also be formulated.

Theorem 9: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 6-8 and 10 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^4/n \to 0$ or $a(g)$ is linear and $K^2/n \to 0$, then
$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,V) \quad\text{and}\quad \hat V \xrightarrow{p} V.$$


The examples of Section 4 meet the conditions of this result, and so for regression splines the average consumer surplus estimator will be $\sqrt{n}$-consistent if $K^4/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$, and the partially linear coefficients and weighted average derivative will be $\sqrt{n}$-consistent if $K^2/n \to 0$ and $\sqrt{n}K^{-s/r} \to 0$.
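Growth conditions of this kind leave a window of admissible rates for $K$. Along the assumed path $K = n^\gamma$, the conditions $K^p/n \to 0$ and $\sqrt{n}K^{-s/r} \to 0$ require $r/(2s) < \gamma < 1/p$. The hypothetical helper below returns that open interval, or None when it is empty. For the spline consumer surplus case the example takes $r = 2$ and $p = 4$, the value consistent with the threshold $s > 2r$ stated above for nonlinear functionals. The window then opens only once $s > 4$.

```python
from fractions import Fraction


def growth_window(p, s, r):
    """Open interval of gamma, with K = n**gamma, for which K**p/n -> 0
    (needs gamma < 1/p) and sqrt(n) * K**(-s/r) -> 0 (needs gamma > r/(2s)).
    Returns None when the interval is empty."""
    lo, hi = Fraction(r, 2 * s), Fraction(1, p)
    return (lo, hi) if lo < hi else None


# Spline consumer surplus case: r = 2, p = 4.
print(growth_window(4, 5, 2))   # (Fraction(1, 5), Fraction(1, 4))
print(growth_window(4, 4, 2))   # None
```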


7.        Conclusion 

This  paper  has  derived  convergence  rate  and  asymptotic  normality  results  for  series 
estimators.     Primitive  conditions  were  given  for  power  series  and  regression  splines. 
Further  refinements  of  these  results  would  be  useful.     It  would  be  good  to  have  results 
for  estimation  of  derivatives  of  functions.     These  could  easily  be  obtained  from 
approximation rates for power series or splines. Also, it would be useful to know
whether  the  uniform  convergence  rates  could  be  improved  on. 
Appendix: Proofs of Theorems

Throughout the Appendix, let $C$ denote a generic constant that may be different in different uses and $\sum_i = \sum_{i=1}^n$. Also, before proving Theorems 1-3, it is helpful to make the following observations that apply to each of those proofs. Since $\hat g$ is invariant to nonsingular linear transformations of $p^K(x)$, it can be assumed throughout that $B = I$, i.e. $p^K(x) = P^K(x)$. Furthermore, since $Q = E[p^K(x)p^K(x)']$ has smallest eigenvalue bounded away from zero, it can be assumed throughout that $Q = I$. This is because, for a symmetric square root $Q^{-1/2}$ of $Q^{-1}$, $Q^{-1/2}p^K(x)$ is a nonsingular linear transformation of $p^K(x)$ satisfying
$$\tilde\zeta_d(K) = \sup_{x\in\mathcal{X},\,|\lambda|\le d}\|\partial^\lambda Q^{-1/2}p^K(x)\| \le C\zeta_d(K).$$
Hence, the conditions imposed on $K$ in Assumptions 2 and 5 will be satisfied when $Q^{-1/2}p^K(x)$ replaces $p^K(x)$, and convergence rate bounds in terms of $\tilde\zeta_d(K)$ derived for this replacement will also hold for the original $\zeta_d(K)$.

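Proof of Theorem 1: First,
$$E[\|\hat Q - I\|^2] = E\Big[\sum_{k=1}^K\sum_{j=1}^K\Big\{\sum_i p_{kK}(x_i)p_{jK}(x_i)/n - I_{jk}\Big\}^2\Big] \le \sum_{k=1}^K\sum_{j=1}^K E[p_{kK}(x)^2p_{jK}(x)^2]/n$$
$$= E\Big[\sum_{k=1}^K p_{kK}(x)^2\sum_{j=1}^K p_{jK}(x)^2\Big]/n \le \zeta_0(K)^2\,\mathrm{tr}(I)/n = \zeta_0(K)^2K/n \to 0,$$
so that
$$\|\hat Q - I\| = O_p(\zeta_0(K)K^{1/2}/\sqrt{n}) = o_p(1). \qquad (19)$$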

Furthermore, since the difference in smallest eigenvalues is bounded in absolute value by $\|\hat Q - I\|$, the smallest eigenvalue of $\hat Q$ converges in probability to one. Let $1_n$ be the indicator function for the smallest eigenvalue of $\hat Q$ being greater than 1/2, so $\mathrm{Prob}(1_n = 1) \to 1$.

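Next, for $G = (g_0(x_1),\dots,g_0(x_n))'$ let $\varepsilon = Y - G$ and $X = (x_1,\dots,x_n)$. Boundedness of $\mathrm{Var}(y|x)$ and independence of the observations imply $E[\varepsilon\varepsilon'|X] \le CI$. Therefore,
$$E[1_n\|\hat Q^{-1/2}P'\varepsilon/n\|^2|X] = 1_nE[\varepsilon'P(P'P)^{-1}P'\varepsilon|X]/n = 1_nE[\mathrm{tr}\{P(P'P)^{-1}P'\varepsilon\varepsilon'\}|X]/n$$
$$= 1_n\mathrm{tr}\{P(P'P)^{-1}P'E[\varepsilon\varepsilon'|X]\}/n \le C1_n\mathrm{tr}\{P(P'P)^{-1}P'\}/n \le CK/n,$$
so by the Markov inequality, $1_n\|\hat Q^{-1/2}P'\varepsilon/n\| = O_p(K^{1/2}/\sqrt{n})$. It follows that
$$1_n\|\hat Q^{-1}P'\varepsilon/n\| = 1_n\{(\varepsilon'P/n)\hat Q^{-1/2}\hat Q^{-1}\hat Q^{-1/2}P'\varepsilon/n\}^{1/2} \le O_p(1)1_n\|\hat Q^{-1/2}P'\varepsilon/n\| = O_p(K^{1/2}/\sqrt{n}).$$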

Let $\beta$ be as in Assumption 3. Then for $g = (g_0(x_1),\dots,g_0(x_n))'$, by $1_nP(P'P)^{-1}P'$ idempotent,
$$1_n\|\hat Q^{-1}P'(g - P\beta)/n\| \le O_p(1)1_n[(g - P\beta)'P(P'P)^{-1}P'(g - P\beta)/n]^{1/2} \le O_p(1)[(g - P\beta)'(g - P\beta)/n]^{1/2} = O_p(K^{-\alpha}).$$

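Therefore, by $1_n(\hat\beta - \beta) = 1_n\hat Q^{-1}P'(Y - g)/n + 1_n\hat Q^{-1}P'(g - P\beta)/n$, it follows that
$$1_n\|\hat\beta - \beta\| \le 1_n\|\hat Q^{-1}P'\varepsilon/n\| + 1_n\|\hat Q^{-1}P'(g - P\beta)/n\| = O_p(K^{1/2}/\sqrt{n} + K^{-\alpha}). \qquad (20)$$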


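Next, by the triangle inequality,
$$1_n\int[\hat g(x) - g_0(x)]^2dF(x) = 1_n\int[p^K(x)'(\hat\beta - \beta) + p^K(x)'\beta - g_0(x)]^2dF(x) \qquad (21)$$
$$\le C1_n\|\hat\beta - \beta\|^2 + C\int[p^K(x)'\beta - g_0(x)]^2dF(x) = O_p(K/n + K^{-2\alpha}) + O(K^{-2\alpha}) = O_p(K/n + K^{-2\alpha}).$$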


Therefore, the first conclusion follows by $1_n = 1$ with probability approaching one. Also, by the triangle and Cauchy-Schwarz inequalities,
$$1_n|\hat g - g_0|_d \le 1_n|p^{K\prime}(\hat\beta - \beta)|_d + |p^{K\prime}\beta - g_0|_d \le \zeta_d(K)1_n\|\hat\beta - \beta\| + O(K^{-\alpha}) \qquad (22)$$
$$= O_p(\zeta_d(K)[(K^{1/2}/\sqrt{n}) + K^{-\alpha}]).$$
P    d 
The second conclusion follows from this equation as the first follows from eq. (21). QED.
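The estimator whose coefficient error is bounded in eq. (20) is $\hat\beta = (P'P)^{-1}P'Y = \hat Q^{-1}P'Y/n$, with $\hat g(x) = p^K(x)'\hat\beta$. A dependency-free Python sketch follows. The solver, the basis, and the noiseless test data are illustrative assumptions. In practice, a QR- or SVD-based least squares routine is numerically safer than forming $P'P$.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def series_ls(xs, ys, basis):
    """beta_hat = (P'P)^{-1} P'Y, where row i of P is basis(x_i) = p^K(x_i)'."""
    P = [basis(x) for x in xs]
    K = len(P[0])
    PtP = [[sum(p[a] * p[b] for p in P) for b in range(K)] for a in range(K)]
    PtY = [sum(p[a] * y for p, y in zip(P, ys)) for a in range(K)]
    return solve(PtP, PtY)


def power_basis(x):
    """Hypothetical K = 3 power series p^K(x) = (1, x, x^2)'."""
    return [1.0, x, x * x]


xs = [i / 10 - 1 for i in range(21)]       # grid on [-1, 1]
ys = [2 - x + 0.5 * x * x for x in xs]     # noiseless quadratic, so g_0 is in the span
beta_hat = series_ls(xs, ys, power_basis)  # approximately [2, -1, 0.5]
```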

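Proof of Theorem 2: Note that Assumption 5 b) implies
$$\sqrt{n}[\zeta_d(K)(\sqrt{K}/\sqrt{n} + K^{-\alpha})]^2 \le 2\{(\zeta_d(K)^4K^2/n)^{1/2} + (\sqrt{n}K^{-\alpha})^2[\zeta_d(K)^4/n]^{1/2}\} \to 0, \qquad (23)$$
$$\zeta_d(K)^2(\sqrt{K}/\sqrt{n} + K^{-\alpha}) = [\zeta_d(K)^4K/n]^{1/2} + [\zeta_d(K)^4/n]^{1/2}\sqrt{n}K^{-\alpha} \to 0.$$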

Let $F = V_K^{-1/2} = (A'\Sigma A)^{-1/2}$. Then by the Cauchy-Schwarz inequality, for $g_K$ as in Assumption 6, $|D(g_K)| = |A'\beta_K| \le \|A\|\|\beta_K\| = \|A\|(E[g_K(x)^2])^{1/2}$, implying $\|A\| \to \infty$. Further, $\Sigma \ge CI$ by $\mathrm{Var}(y|x)$ bounded below and $Q = I$, so that $V_K = A'\Sigma A \ge C\|A\|^2$, and hence
$$|F| \le C, \qquad \|FA\|^2 = \mathrm{tr}(FA'AF) \le \mathrm{tr}(CFV_KF) = C.$$


Then, for $1_n$ as in the proof of Theorem 1,
$$1_n\|FA'\hat Q^{-1}\| \le 1_n\|FA'\| + 1_n\|FA'(I - \hat Q)\hat Q^{-1}\| \le C + O_p(1)\|FA'\|\|\hat Q - I\| = O_p(1), \qquad (24)$$
$$1_n\|FA'\hat Q^{-1/2}\|^2 \le \|FA'\|^2 + 1_n\mathrm{tr}(FA'(I - \hat Q)\hat Q^{-1}AF) \le C + C\|FA'\|\,1_n\|FA'\hat Q^{-1}\|\,\|\hat Q - I\| = O_p(1).$$

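Now, let $\bar g_K(x) = p^K(x)'\beta_K$ for $\beta_K$ from Assumption 3, $G = (g_0(x_1),\dots,g_0(x_n))'$, and $\varepsilon = Y - G$. Then
$$1_n\sqrt{n}V_K^{-1/2}(\hat\theta - \theta_0) = 1_n\sqrt{n}F[a(\hat g) - a(g_0)] = 1_n\{\sqrt{n}F[a(\hat g) - a(g_0) - D(\hat g) + D(g_0)] \qquad (25)$$
$$+ FA'P'\varepsilon/\sqrt{n} + \sqrt{n}FA'(\hat Q^{-1} - I)P'\varepsilon/n + \sqrt{n}FA'\hat Q^{-1}P'(G - P\beta_K)/n + \sqrt{n}F[D(\bar g_K) - D(g_0)]\}.$$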

By Assumptions 3 and 5, $1_n|\sqrt{n}F[D(\bar g_K) - D(g_0)]| \le \sqrt{n}C|D(\bar g_K - g_0)| \le C\sqrt{n}|\bar g_K - g_0|_d \le C\sqrt{n}K^{-\alpha} \to 0$.


Also, by the Cauchy-Schwarz inequality and eq. (24),
$$1_n|\sqrt{n}FA'\hat Q^{-1}P'(G - P\beta_K)/n| \le 1_n\|FA'\hat Q^{-1}P'/\sqrt{n}\|\,\|G - P\beta_K\| \le 1_n\|FA'\hat Q^{-1/2}\|\sqrt{n}\max_{i\le n}|g_0(x_i) - \bar g_K(x_i)|$$
$$\le 1_n\|FA'\hat Q^{-1/2}\|\sqrt{n}|g_0 - \bar g_K|_0 = O_p(\sqrt{n}K^{-\alpha}) = o_p(1).$$
Next, let $X = (x_1,\dots,x_n)$ and $p_i = p^K(x_i)$. Then by $E[\varepsilon|X] = 0$,
$$E[1_n\|\sqrt{n}FA'(\hat Q^{-1} - I)P'\varepsilon/n\|^2|X] = 1_n\mathrm{tr}\Big(FA'(\hat Q^{-1} - I)\Big[\sum_ip_ip_i'\sigma^2(x_i)/n\Big](\hat Q^{-1} - I)AF\Big)$$
$$\le C1_n\mathrm{tr}(FA'(\hat Q^{-1} - I)\hat Q(\hat Q^{-1} - I)AF) = C1_n\mathrm{tr}(FA'(I - \hat Q)\hat Q^{-1}(I - \hat Q)AF) \le C\|FA\|^2\|\hat Q - I\|^2 \xrightarrow{p} 0.$$
Therefore, since this conditional expectation is $o_p(1)$, it follows that
$$1_n\|\sqrt{n}FA'(\hat Q^{-1} - I)P'\varepsilon/n\|^2 \xrightarrow{p} 0.$$
n 


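Next, let $Z_{in} = FA'p_i\varepsilon_i/\sqrt{n}$, so that $\sum_iZ_{in} = FA'P'\varepsilon/\sqrt{n}$. Note that for each $n$, the $Z_{in}$ $(i = 1,\dots,n)$ are i.i.d. Also, $E[Z_{in}] = 0$, $\sum_iE[Z_{in}^2] = 1$, and
$$nE[1(|Z_{in}| > \varepsilon)Z_{in}^2] = n\varepsilon^2E[1(|Z_{in}/\varepsilon| > 1)(Z_{in}/\varepsilon)^2] \le n\varepsilon^2E[(Z_{in}/\varepsilon)^4] \qquad (26)$$
$$\le n\varepsilon^2\|FA'\|^4\zeta_0(K)^2E[\|p_i\|^2E[\varepsilon_i^4|x_i]]/(n^2\varepsilon^4) \le C\zeta_0(K)^2K/n \to 0.$$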

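Then by the Lindeberg-Feller central limit theorem, $\sum_iZ_{in} \xrightarrow{d} N(0,1)$, so, by the previous result and $1_n \xrightarrow{p} 1$, $1_nFA'\hat Q^{-1}P'\varepsilon/\sqrt{n} \xrightarrow{d} N(0,1)$. Finally, by Theorem 1, Assumption 5, and eq. (23),
$$|\sqrt{n}F[a(\hat g) - a(g_0) - D(\hat g) + D(g_0)]| \le C\sqrt{n}(|\hat g - g_0|_d)^2 = O_p(\sqrt{n}[\zeta_d(K)(\sqrt{K}/\sqrt{n} + K^{-\alpha})]^2) \xrightarrow{p} 0.$$
Then by eq. (25) and the triangle inequality, $1_n\sqrt{n}V_K^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1)$, so that $\sqrt{n}V_K^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1)$ follows by $(1 - 1_n)\sqrt{n}V_K^{-1/2}(\hat\theta - \theta_0) \xrightarrow{p} 0$.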

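Next, consider case a) of Assumption 5, where $a(g)$ is linear in $g$. Then $a(p^{K\prime}\beta) = A'\beta$, so $\hat A = A$. In case b) of Assumption 5, $\zeta_d(K)[\sqrt{K}/\sqrt{n} + K^{-\alpha}] = [\zeta_d(K)^2K/n]^{1/2}(1 + K^{-1/2}\sqrt{n}K^{-\alpha}) \to 0$, so that by Theorem 1, $|\hat g - g_0|_d \xrightarrow{p} 0$. Therefore, for the $\varepsilon$ from Assumption 5 and $I_n$ equal to the indicator function for $|\hat g - g_0|_d < \varepsilon/2$ and $1_n = 1$, $\mathrm{Prob}(I_n = 1) \to 1$. Also, let $J = (D(p_{1K};\hat g),\dots,D(p_{KK};\hat g))'$. By Assumption 5, for any $\beta$ such that $|p^{K\prime}\beta - \hat g|_d < \varepsilon/2$, implying $|p^{K\prime}\beta - g_0|_d < \varepsilon$,
$$I_n|a(p^{K\prime}\beta) - a(\hat g) - J'(\beta - \hat\beta)|/\|\beta - \hat\beta\| = I_n|a(p^{K\prime}\beta) - a(\hat g) - D(p^{K\prime}\beta;\hat g) + D(\hat g;\hat g)|/\|\beta - \hat\beta\|$$
$$\le I_nC(|p^{K\prime}\beta - \hat g|_d)^2/\|\beta - \hat\beta\| \le I_nC\zeta_d(K)^2\|\beta - \hat\beta\| \to 0$$
as $\beta \to \hat\beta$, so that $\hat A$ exists and equals $J$ when $I_n = 1$. Therefore, $\hat A$ exists with probability approaching one. Furthermore, by linearity of $D(g;\tilde g)$ in $g$, Assumption 5, and Theorem 1,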




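$$I_n\|\hat A - A\|^2 = I_n(\hat A - A)'(\hat A - A) = I_n|D((\hat A - A)'p^K;\hat g) - D((\hat A - A)'p^K;g_0)| \qquad (27)$$
$$\le CI_n|(\hat A - A)'p^K|_d|\hat g - g_0|_d \le I_nC\|\hat A - A\|\zeta_d(K)|\hat g - g_0|_d.$$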


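Dividing through by $\|\hat A - A\|$ and using Theorem 1 and eq. (23) gives $I_n\|\hat A - A\| \le C\zeta_d(K)|\hat g - g_0|_d = O_p(\zeta_d(K)^2[\sqrt{K}/\sqrt{n} + K^{-\alpha}]) \xrightarrow{p} 0$. Therefore, $I_n\|F\hat A\| \le I_n|F|\|\hat A - A\| + \|FA\| = O_p(1)$, and similarly $I_n\|F\hat A'\hat Q^{-1}\| = O_p(1)$.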

Next, let $\hat h = I_n\hat Q^{-1}\hat AF$ and $h = I_nAF$. Then $\|h\| = O_p(1)$ and
$$\|\hat h - h\| \le I_n\|F\hat A'\hat Q^{-1}(I - \hat Q)\| + I_n\|F(\hat A - A)'\| \le I_n\|F\hat A'\hat Q^{-1}\|\,\|I - \hat Q\| + I_n|F|\,\|\hat A - A\| \xrightarrow{p} 0.$$


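Since $\Sigma \le CI$, the largest eigenvalue of $\Sigma$ is bounded above, and hence by $h'\Sigma h = 1$ when $I_n = 1$,
$$I_n|\hat h'\Sigma\hat h - 1| = |\hat h'\Sigma\hat h - h'\Sigma h| \le (\hat h - h)'\Sigma(\hat h - h) + |2(\hat h - h)'\Sigma h| \qquad (28)$$
$$\le C\|\hat h - h\|^2 + 2[(\hat h - h)'\Sigma(\hat h - h)]^{1/2}[h'\Sigma h]^{1/2} \le o_p(1) + C\|\hat h - h\| \xrightarrow{p} 0.$$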

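Also, let $\tilde\Sigma = \sum_ip_ip_i'\varepsilon_i^2/n$. It follows by $E[\varepsilon_i^4|x_i]$ bounded, and an argument like that for $\|\hat Q - I\| \xrightarrow{p} 0$ from Theorem 1, that $\|\tilde\Sigma - \Sigma\| \xrightarrow{p} 0$, implying
$$I_n|\hat h'\tilde\Sigma\hat h - \hat h'\Sigma\hat h| = |\hat h'(\tilde\Sigma - \Sigma)\hat h| \le \|\hat h\|^2\|\tilde\Sigma - \Sigma\| = O_p(1)o_p(1) \xrightarrow{p} 0. \qquad (29)$$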

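Now, let $\Delta_i = g_0(x_i) - \hat g(x_i)$. Note that $\zeta_0(K)[(K/n)^{1/2} + K^{-\alpha}] = [\zeta_0(K)^2K/n]^{1/2}(1 + \sqrt{n}K^{-\alpha}/K^{1/2}) \to 0$, so by Theorem 1, $\max_{i\le n}|\Delta_i| \le |\hat g - g_0|_0 = O_p(o(1)) \xrightarrow{p} 0$. Also, let $\hat S = n^{-1}\sum_ip_ip_i'|\varepsilon_i|$ and $S = E[p_ip_i'|\varepsilon_i|] = E[p_ip_i'E[|\varepsilon_i|\,|x_i]] \le CQ = CI$. By $E[\varepsilon_i^2|x_i]$ bounded and a similar argument to $\|\hat Q - I\| \xrightarrow{p} 0$, it follows that $\|\hat S - S\| \xrightarrow{p} 0$. Therefore,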

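$$I_n|F\hat VF - \hat h'\tilde\Sigma\hat h| = |\hat h'(\hat\Sigma - \tilde\Sigma)\hat h| = \Big|n^{-1}\sum_i(\hat h'p_i)^2(2\varepsilon_i\Delta_i + \Delta_i^2)\Big| \qquad (30)$$
$$\le 2\max_{i\le n}|\Delta_i|\,n^{-1}\sum_i(\hat h'p_i)^2|\varepsilon_i| + \max_{i\le n}|\Delta_i^2|\,n^{-1}\sum_i(\hat h'p_i)^2 \le o_p(1)[\hat h'(\hat S + \hat Q)\hat h]$$
$$\le o_p(1)[\hat h'(\hat S + \hat Q - S - I)\hat h] + o_p(1)[\hat h'(S + I)\hat h] \le o_p(1)\|\hat h\|^2(\|\hat S - S\| + \|\hat Q - I\|) + o_p(1)\|\hat h\|^2 \xrightarrow{p} 0.$$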


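Then by the triangle inequality and eqs. (28)-(30), it follows that $I_n|F\hat VF - 1| \xrightarrow{p} 0$. Then by $\mathrm{Prob}(I_n = 1) \to 1$, $F^2\hat V \xrightarrow{p} 1$, implying
$$\sqrt{n}\hat V^{-1/2}(\hat\theta - \theta_0) = \sqrt{n}F(\hat\theta - \theta_0)/(F^2\hat V)^{1/2} \xrightarrow{d} N(0,1),$$
giving the last conclusion.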

Finally, to show the first conclusion, it suffices to show that $|V_K| \le C\zeta_d(K)^2$, because $\hat\theta = \theta_0 + (V_K^{1/2}/\sqrt{n})\sqrt{n}F(\hat\theta - \theta_0) = \theta_0 + O_p(V_K^{1/2}/\sqrt{n})$. Note that for any $\beta$, the Cauchy-Schwarz inequality implies $|p^{K\prime}\beta|_d \le \zeta_d(K)\|\beta\|$, so that $\|A\|^2 = |D(p^{K\prime}A)| \le C|p^{K\prime}A|_d \le C\zeta_d(K)\|A\|$. Dividing by $\|A\|$ then gives $\|A\| \le C\zeta_d(K)$, and hence $|V_K| \le C\|A\|^2 \le C\zeta_d(K)^2$. QED.
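In this proof the variance estimate enters through $F\hat VF = \hat h'\hat\Sigma\hat h$, with $\hat h = \hat Q^{-1}\hat AF$ and $\hat\Sigma = \sum_ip_ip_i'(y_i - \hat g(x_i))^2/n$. This corresponds to the sandwich form $\hat V = \hat A'\hat Q^{-1}\hat\Sigma\hat Q^{-1}\hat A$. The sketch below computes that quantity for a scalar functional. It assumes the caller supplies three inputs: the basis matrix with rows $p_i'$, the residuals $y_i - \hat g(x_i)$, and $\hat A$. For a linear functional, $\hat A = A = (a(p_{1K}),\dots,a(p_{KK}))'$. The function names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def sandwich_variance(P, resid, A_hat):
    """V_hat = A' Q^{-1} Sigma Q^{-1} A with Q = P'P/n and
    Sigma = sum_i p_i p_i' e_i^2 / n, computed as h' Sigma h for h = Q^{-1} A."""
    n, K = len(P), len(A_hat)
    Q = [[sum(p[a] * p[b] for p in P) / n for b in range(K)] for a in range(K)]
    h = solve(Q, A_hat)
    # h' Sigma h = (1/n) * sum_i (h' p_i)^2 * e_i^2, without forming Sigma
    return sum((sum(hk * pk for hk, pk in zip(h, p)) * e) ** 2
               for p, e in zip(P, resid)) / n


# With only a constant regressor and A_hat = (1), V_hat is the mean squared residual.
print(sandwich_variance([[1.0]] * 4, [-1.5, -0.5, 0.5, 1.5], [1.0]))   # 1.25
```

An approximate standard error for $\hat\theta$ is then $(\hat V/n)^{1/2}$, in line with $\sqrt{n}\hat V^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,1)$.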

Proof of Theorem 3: By the Cramér-Wold device and by symmetry of $\hat V$ and $V$, it suffices to prove that $c'\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,c'Vc)$ and $c'\hat Vc \xrightarrow{p} c'Vc$ for any conformable constant vector $c$ with $\|c\| = 1$. Furthermore, for the functional $c'a(g)$, Assumption 6 is satisfied with $v(x)$ replaced by $c'v(x)$. Therefore, it suffices to prove the result for scalar $v(x)$. Let $v_K = A'p^K(x) = E[v(x)p^K(x)']Q^{-1}p^K(x)$ be the mean-square projection of $v = v(x)$ on the approximating functions, where the $x$ argument is dropped for notational simplicity. Then $V_K = E[v_K^2\sigma^2(x)]$ by Assumption 6, and $E[(v - v_K)^2] \le E[\{v - p^K(x)'\beta_K\}^2] \to 0$. Assumption 1 and the Cauchy-Schwarz inequality then give

$$|V_K - V| \le E[|v_K^2 - v^2|\sigma^2(x)] \le CE[(v_K - v)^2] + 2CE[|v||v_K - v|] \le o(1) + 2C\{E[v^2]\}^{1/2}\{E[(v - v_K)^2]\}^{1/2} \to 0.$$
Thus, $V_K \to V$. By $\sigma^2(x)$ bounded away from zero and $v(x)$ nonzero, $V > 0$. The proof now follows exactly as in the proof of Theorem 2, except that $F = V_K^{-1/2}$ is bounded because $V_K^{-1/2} \to 1/\sqrt{V}$. Therefore, by the conclusion of Theorem 2, $\sqrt{n}(\hat\theta - \theta_0) = V_K^{1/2}\sqrt{n}V_K^{-1/2}(\hat\theta - \theta_0) \xrightarrow{d} N(0,V)$. Also, as in the proof of Theorem 2, $F\hat V^{1/2} \xrightarrow{p} 1$, so that $V_K^{-1/2}\hat V^{1/2} \xrightarrow{p} 1$, and squaring gives $\hat V \xrightarrow{p} V$ by $V_K \to V$. QED.

Proof of Theorem 4: Let $P^K(x)$ be obtained by transforming the $x$'s so that their support is $[-1,1]^r$ and replacing the individual powers in each term by the Jacobi polynomial of the same order that is orthonormal with respect to the uniform distribution on $[-1,1]$, which comprises a nonsingular linear transformation of $p^K(x)$. By the density of $x$ bounded below, $CI \le E[P^K(x)P^K(x)']$. Also, it follows as in eqs. (3.13) and (3.14) of Andrews (1991) that $\max_{k\le K}\sup_{x\in\mathcal{X}}|P_{kK}(x)| \le CK^{1/2}$, so that $\|P^K(x)\| \le CK$, and hence $\zeta_0(K)^2K/n \le CK^3/n \to 0$. Furthermore, it follows by Theorem 8 of Lorentz (1986) that Assumption 3 is satisfied with $d = 0$ and $\alpha = s/r$, so the conclusion follows by Theorem 1. QED.
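For $r = 1$, two facts used in this proof can be checked numerically. The sketch uses orthonormal Legendre polynomials, which are the Jacobi case orthonormal for the uniform distribution. First, $\|P^K(x)\|$ is maximized at $x = \pm 1$, where it equals $K$, so $\zeta_0(K)$ is of order $K$. Second, for uniformly distributed $x$, the sample second moment matrix $\hat Q$ is close to $I$, as eq. (19) predicts. The uniform design, the sample size, the seed, and the helper names are all hypothetical choices.

```python
import random
from math import sqrt


def legendre_orthonormal(x, K):
    """sqrt(2k + 1) * P_k(x) for k = 0, ..., K - 1: orthonormal Legendre
    polynomials under the uniform distribution on [-1, 1]."""
    P = [1.0, x][:K]
    for k in range(1, K - 1):
        # Bonnet recursion: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        P.append(((2 * k + 1) * x * P[k] - k * P[k - 1]) / (k + 1))
    return [sqrt(2 * k + 1) * v for k, v in enumerate(P)]


def zeta0_at_endpoint(K):
    """||P^K(1)||; since |P_k(x)| <= 1 = P_k(1), this is the sup over [-1, 1]."""
    return sqrt(sum(v * v for v in legendre_orthonormal(1.0, K)))


def second_moment_gap(K, n, seed=0):
    """Frobenius norm of Q_hat - I for n uniform draws on [-1, 1]."""
    rng = random.Random(seed)
    Q = [[0.0] * K for _ in range(K)]
    for _ in range(n):
        p = legendre_orthonormal(rng.uniform(-1.0, 1.0), K)
        for a in range(K):
            for b in range(K):
                Q[a][b] += p[a] * p[b] / n
    return sqrt(sum((Q[a][b] - (a == b)) ** 2 for a in range(K) for b in range(K)))


print(zeta0_at_endpoint(5))          # close to 5, i.e. zeta_0(K) = K here
print(second_moment_gap(4, 20_000))  # small, consistent with eq. (19)
```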

Proof of Theorem 5: It follows as in the proof of Theorem 4 that Assumptions 2 and 3 are satisfied. To show Assumption 5, note that by Assumption 9 there is a sequence of uniformly bounded continuous functions $g_J(x)$ such that $|a(g_J)| > \varepsilon$ for all $J$ and $E[g_J(x)^2] \to 0$. Since power series can approximate a continuous function in the supremum norm on a compact set, there is $\beta_{KJ}$ such that $\Delta_{KJ} = \sup_{x\in\mathcal{X}}|g_J(x) - \beta_{KJ}'p^K(x)| \to 0$ as $K \to \infty$ for any fixed $J$. Let $K(J)$ be a monotonic increasing sequence such that $\Delta_{KJ} < 1/J$ for $K \ge K(J)$, and let $\beta_K = \beta_{KJ}$ for $K(J) \le K < K(J+1)$, $\beta_K = 0$ for $K < K(1)$, and $g_K(x) = p^K(x)'\beta_K$. Then by the triangle inequality, $|a(g_K)| > \varepsilon$ for all $K \ge K(J_\varepsilon)$, $J_\varepsilon > 1/\varepsilon$, and $E[g_K(x)^2] \to 0$, so Assumption 5 is satisfied. Then, since $d = 0$, and from the proof of Theorem 4, $\zeta_0(K) \le CK$, so for nonlinear $a(g)$ it follows that $\zeta_0(K)^4K^2/n \le CK^6/n \to 0$ and, since Assumption 3 is satisfied with $\alpha = s/r$, $\sqrt{n}K^{-\alpha} = \sqrt{n}K^{-s/r} \to 0$. The conclusion then follows by Theorem 2. QED.


Proof  of  Theorem  6:     Follows  by  Theorem  3  similarly  to  the  proof  of  Theorem  5,  with 
Assumption  6  replacing  Assumption  9.     QED. 

Proof of Theorem 7: It follows by Lemma A.16 of Newey (1994a) that, for $P^K(x)$ equal to the products of normalized B-splines (e.g. see Powell, 1981) multiplied by the square root of the number of knots, $\|P^K(x)\| \le CK^{1/2} = \zeta_0(K)$, and hence $\zeta_0(K)^2K/n \le CK^2/n \to 0$. Furthermore, it follows by the argument of Burman and Chen (1989, p. 1587) that the rest of Assumption 2 is satisfied. Also, Assumption 3 with $d = 0$ and $\alpha = s/r$ follows by Theorem 12.8 of Schumaker (1981), so the conclusion follows from Theorem 1. QED.

Proof  of  Theorem  8:     Follows  as  in  the  proof  of  Theorem  5. 

Proof  of  Theorem  9:     Follows  as  in  the  proof  of  Theorem  6. 